Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

i.cca: use 0 based array indexing #3239

Draft
wants to merge 7 commits into
base: main
Choose a base branch
from
Draft

Conversation

marisn
Copy link
Contributor

@marisn marisn commented Nov 15, 2023

When ccmath library was introduced in 49b8066 most of arrays were moved to 0 based indexing but some locations got missed. Most likely it is the source of a crash reported a few years a go: https://trac.osgeo.org/grass/ticket/2297

With this fix I'm able to run the code without a segfault, but I haven't figured out how to get any output and thus no tests are provided.

When ccmath library was introduced in 49b8066
most of arrays were moved to 0 based indexing but some locations
got missed. Most likely it is the source of a crash reported a few years
a go: https://trac.osgeo.org/grass/ticket/2297
@marisn marisn added the C Related code is in C label Nov 15, 2023
@neteler neteler added the bug Something isn't working label Nov 15, 2023
@neteler neteler added this to the 8.4.0 milestone Nov 15, 2023
@marisn marisn marked this pull request as draft November 16, 2023 18:37
@marisn
Copy link
Contributor Author

marisn commented Nov 16, 2023

Although this PR fixes segfault, the code does not produce correct output (in comparison to pre- 49b8066 version). Unless the testcase passes, this PR does not improve on the current state.

cov[i][j][k] = cov[i][k][j] = sigs.sig[i - 1].var[j - 1][k - 1];
for (j = 0; j < bands; j++) {
mu[i][j] = sigs.sig[i].mean[j];
for (k = 0; k < j; k++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try

Suggested change
for (k = 0; k < j; k++) {
for (k = 0; k <= j; k++) {

because j is not the upper bound of k to be excluded but to be included, also with zero-based indexing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for spotting my mistake!

Still even with this being fixed, output is not identical to the old version. Should it be identical? Suggestions how to test?

Copy link
Contributor

@metzm metzm Dec 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The functions in stats.c assume zero-based indices, thus I would expect different results in the sense that results with your PR (zero based indices) are correct whereas previous results (one based indices) were not correct, particularly because the indexing to cov[][][] did not change.

You would need some independent, trusted alternative for Canonical components analysis in order to decide which results are correct. Regarding the code, your PR seems like a bugfix, previous results are probably not correct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for clarification, cov has been indexed previously in main.c as cov[i][j][k] which was wrong for one-based i, j, k because zero-based i, j, k are used in stats.c. With your PR, cov is consistently indexed with zero-based i, j, k, that could explain the differences.

@wenzeslaus wenzeslaus added the errata Include in errata (fixes wrongly computed results) label Jan 5, 2024
@wenzeslaus wenzeslaus modified the milestones: 8.4.0, Future Apr 27, 2024
@github-actions github-actions bot added Python Related code is in Python module imagery tests Related to Test Suite labels Nov 28, 2024
@neteler
Copy link
Member

neteler commented Dec 3, 2024

Testing this PR I see that the component numbers in the warning differ from the generated output:

# GRASS nc_spm_08_grass7/landsat:~ >
i.cca group=landsat_group subgroup=landsat_subgroup    \
       ignature=landsat_signatures output=cca_component
Transform completed.
WARNING: The output cell map <cca_component.0> has values outside the 0-255
         range.
WARNING: The output cell map <cca_component.1> has values outside the 0-255
         range.
WARNING: The output cell map <cca_component.2> has values outside the 0-255
         range.

g.list raster pattern="cca_component*"
cca_component.1
cca_component.2
cca_component.3
cca_component.4

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@neteler neteler added the backport to 8.4 PR needs to be backported to release branch 8.4 label Dec 3, 2024
Comment on lines 29 to 37
cls.runModule("g.region", raster="lsat7_2000_20")
cls.group_name = tempname(10)
cls.subgroup_name = "vis"
cls.runModule(
"i.group",
group=cls.group_name,
subgroup=cls.subgroup_name,
input="lsat7_2000_20,lsat7_2000_30,lsat7_2000_40",
)
Copy link
Member

@neteler neteler Dec 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
cls.runModule("g.region", raster="lsat7_2000_20")
cls.group_name = tempname(10)
cls.subgroup_name = "vis"
cls.runModule(
"i.group",
group=cls.group_name,
subgroup=cls.subgroup_name,
input="lsat7_2000_20,lsat7_2000_30,lsat7_2000_40",
)
cls.runModule("g.region", raster="lsat7_2002_20")
cls.group_name = tempname(10)
cls.subgroup_name = "vis"
cls.runModule(
"i.group",
group=cls.group_name,
subgroup=cls.subgroup_name,
input="lsat7_2002_20,lsat7_2002_30,lsat7_2002_40",
)

There is an CI error: "b'ERROR: Raster map <lsat7_2000_20> not found\n'" - the available Landsat datasets are of 2002. Updated accordingly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will probably affect the expected results, @marisn?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Markus, I have had no time (and am not certain if my math level is up to task) to go over formulas from the original paper (or was it a technical document?) to create a synthetic dataset with known results as I did for the r.kappa module. This is the reason why this PR has not been merged as at the moment is possible only to test if module runs but not if it produces correct output.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, funny understood.
So I'd suggest to commit the data name change and see what happens next.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, it went further, till here:

 Running ./imagery/i.cca/testsuite/test_i_cca.py...
========================================================================
F
======================================================================
FAIL: test_output_values (__main__.OutputMatchTest.test_output_values)
Test correctness of i.cca output
----------------------------------------------------------------------
Traceback (most recent call last):
  File "imagery/i.cca/testsuite/test_i_cca.py", line 83, in test_output_values
    self.assertRasterMinMax(
  File "etc/python/grass/gunittest/case.py", line 539, in assertRasterMinMax
    self.fail(self._formatMessage(msg, stdmsg))
AssertionError: Wrong calculated value 
The actual maximum (14) is greater than the reference one (-3) for raster map tmp_oy5WA6oFO0.2 (with minimum -17)

@neteler
Copy link
Member

neteler commented Dec 4, 2024

Just for fun, here an attempt for a synthetic i.cca test case (surely to be improved to become a real test case):

Minimalistic i.cca Example with Synthetic Data

Set a simple region for your synthetic raster maps, creating a 10x10 grid with a resolution of 10 units:

g.region n=100 s=0 e=100 w=0 res=10 -p

Generate synthetic raster maps by using r.mapcalc to create input raster maps with varying patterns:

# map1: Random values between 0 and 100
r.mapcalc "map1 = rand(0, 100)" -s
# map2: X-coordinate values with added noise
r.mapcalc "map2 = x() + rand(0, 10)" -s
map3: Y-coordinate values with added noise
r.mapcalc "map3 = y() + rand(0, 10)" -s

r.info -r map=map1
r.info -r map=map2
r.info -r map=map3

d.mon wx0
d.rast map1
d.rast map2
d.rast map3

Create an imagery group and subgroup with these synthetic raster maps:

i.group group=synthetic_group subgroup=synthetic_subgroup \
        input=map1,map2,map3

Create synthetic training data by generating a simple vector map with labeled polygons for training. This creates:

  • Random training points (training_points)
  • Buffer zones around points (training_areas)
  • Three classes (class = 1 .. 3)
v.random output=training_points n=5 seed=1
v.db.addtable map=training_points columns="class INTEGER"
v.db.update map=training_points column=class value=cat
v.db.select map=training_points

v.buffer input=training_points output=training_areas distance=20 -t
v.db.select map=training_areas

v.db.update map=training_areas column=class value=1 where="cat <= 2"
v.db.update map=training_areas column=class value=2 where="cat == 3"
v.db.update map=training_areas column=class value=3 where="cat > 3"
v.db.select map=training_areas
d.vect training_areas -c

Rasterize the training areas:

v.to.rast input=training_areas output=training_areas use=attr attribute_column=class
r.info map=training_areas -r
r.category map=training_areas

d.erase
d.rast training_areas

Next generate a signature file from the synthetic training data:

i.gensig trainingmap=training_areas group=synthetic_group subgroup=synthetic_subgroup \
         signaturefile=synthetic_signatures

Perform Canonical Components Analysis with i.cca:

i.cca group=synthetic_group subgroup=synthetic_subgroup \
      signature=synthetic_signatures output=cca_comp

Display the results:

d.rast map=cca_comp.1
d.rast map=cca_comp.2
d.rast map=cca_comp.3

Check statistics of the output raster maps:

r.univar map=cca_comp.1
r.univar map=cca_comp.2
r.univar map=cca_comp.3

d.erase
d.rgb b=cca_comp.1 g=cca_comp.2 r=cca_comp.3

@marisn
Copy link
Contributor Author

marisn commented Dec 5, 2024

Markus, this still only tests if module runs and not if module runs correctly. The proper solution is to open http://dbwww.essc.psu.edu/lasdoc/user/canal.html, come up with some numbers, plug them into provided formulas and get results. Then they can be used to create a test case that validates GRASS code output.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport to 8.4 PR needs to be backported to release branch 8.4 bug Something isn't working C Related code is in C errata Include in errata (fixes wrongly computed results) imagery module Python Related code is in Python tests Related to Test Suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants